Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors
نویسندگان
چکیده
Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of a heterogeneous processor and generate a possibly incorrect results. Then the CPU part of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over the best existing CSR-based SpMV algorithms.
منابع مشابه
Segmented Operations for Sparse Matrix Computation on Vector Multiprocessors
In this paper we present a new technique for sparse matrix multiplication on vector multiprocessors based on the efficient implementation of a segmented sum operation. We describe how the segmented sum can be implemented on vector multiprocessors such that it both fully vectorizes within each processor and parallelizes across processors. Because of our method’s insensitivity to relative row siz...
متن کاملBreaking the performance bottleneck of sparse matrix-vector multiplication on SIMD processors
The low utilization of SIMD units and memory bandwidth is the main performance bottleneck on SIMD processors for sparse matrix-vector multiplication (SpMV), which is one of the most important kernels in many scientific and engineering applications. This paper proposes a hybrid optimization method to break the performance bottleneck of SpMV on SIMD processors. The method includes a new sparse ma...
متن کاملA Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units | SIAM Journal on Scientific Computing | Vol. 36, No. 5 | Society for Industrial and Applied Mathematics
Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instru...
متن کاملVector ISA Extension for Sparse Matrix-Vector Multiplication
In this paper we introduce a vector ISA extension to facilitate sparse matrix manipulation on vector processors (VPs). First we introduce a new Block Based Compressed Storage (BBCS) format for sparse matrix representation and a Block-wise Sparse Matrix-Vector Multiplication approach. Additionally, we propose two vector instructions, Multiple Inner Product and Accumulate (MIPA) and LoaD Section ...
متن کاملA Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units
Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instru...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Parallel Computing
دوره 49 شماره
صفحات -
تاریخ انتشار 2015